# STYL 10 Year Replication

Replication package for "Cognitive Behavior Therapy Reduces Crime and Violence over 10 Years: Experimental Evidence"

## Version History

The data and code in this deposit were updated after the article was published. The modifications do not affect the tables and figures in the paper.

- **V1**: Original version
- **V2**: The datasets listed below were de-identified by removing identifying variables. The code files listed below were edited to comment out the commands that previously dropped those variables. The results remain unchanged.

**Datasets de-identified:**

- `data/clean/` (all files)
- `data/prev/` (all files except `SDV_Data.csv` and `STYL_attrit.dta`)
- `data/raw/` (all files)

**Code files edited for de-identification:**

- `dofiles/preindex/STYL_longrun_check.do`
- `dofiles/preindex/STYL_longrun_merge.do`
- `dofiles/preindex/STYL_longrun_prep.do`

## Computational Requirements

- Stata 14 or later (other versions should also be able to run the code, but this has not been confirmed)
- At least 430 MB of free disk space to download and process the data
- Wall-clock time: less than 1 hour on a MacBook Pro
- Internet connection (for initial package installation only)

## Quick Start Checklist

Before running the replication, verify:

- [ ] All 25 required data files are in place (see [Required Data Files](#required-data-files))
- [ ] `$direc` path is set correctly in `master_10Yrep.do`
- [ ] Required Stata packages are installed (run `STYL_10Yrep_packages.do` first, or let the master file install them)
- [ ] `outfiles/tables/` and `outfiles/figures/` folders exist
- [ ] `AER2017.xlsx` exists in `outfiles/`
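
Part of this checklist can be automated. The following is a minimal sketch, not part of the package, that checks a small sample of the required files. It assumes `$direc` is already set with a trailing slash, and the three paths are taken from the file tables in this README rather than covering the full list.

```stata
* Pre-flight check (sketch): confirm a sample of required inputs exists.
* ASSUMPTION: $direc is already set, with a trailing slash.
* The three files below are a sample from this README, not the full set.
local missing 0
foreach f in "data/raw/styl_outcomes.dta" "data/prev/STYL_adm.dta" "data/prev/SDV_Data.csv" {
    capture confirm file "${direc}`f'"
    if _rc {
        display as error "Missing: ${direc}`f'"
        local ++missing
    }
}
display as text "Sampled required files missing: `missing'"
```

Extending the list inside `foreach` to all required files would turn this into a complete check.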

## Instructions

To run the replication package:

1. Adjust the global `$direc` in `master_10Yrep.do` to point to your local directory

    ```stata
    * Set global directory
    global  direc  "/PATH/TO/REPLICATION/FOLDER/"
    ```

    **Note:** Include the trailing slash. Use forward slashes even on Windows.

2. Run `master_10Yrep.do`

This will:

- Populate the `outfiles/figures/` and `outfiles/tables/` folders
- Update the "t9 raw" sheet of `AER2017.xlsx` (Table 3 data)
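
The trailing-slash note in step 1 matters if paths are built by appending directly to `$direc`. The sketch below illustrates that pattern under this assumption; the exact usage inside the do-files is not reproduced here, and the guard at the end is a suggestion rather than something the package is known to include.

```stata
* Sketch: how a path built from $direc resolves.
* ASSUMPTION: the do-files append relative paths directly to $direc.
global direc "/PATH/TO/REPLICATION/FOLDER/"

* With the trailing slash, the pieces join into a valid path.
local panel "${direc}data/clean/STYL_lr_panel_analysis.dta"
display "`panel'"

* Guard: stop early if the global does not end in a forward slash.
if substr("${direc}", -1, 1) != "/" {
    display as error "direc must end with a forward slash"
    exit 198
}
```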

## Required Data Files

The following **25 data files** are required to run the replication. These must be placed in the appropriate folders before running the code.

### Raw Survey Data (data/raw/)

| File | Description |
| ------ | ------------- |
| `styl_endline_clean_combined.dta` | 10-year endline survey data (rounds 7-8) |
| `styl_round2_clean_combined.dta` | Round 2 survey data |
| `styl_outcomes.dta` | Tracking and outcome data (attrition, mortality status) |
| `deaths_followup_2022/STLY Deaths Followup.dta` | Death validation follow-up data |
| `deaths_followup_2022/Pilot STLY Deaths Followup.dta` | Pilot death validation data |

### Previous Round Data (data/prev/)

**Phase 1 (Pilot) datasets:**

| File | Description |
| ------ | ------------- |
| `p1_baseline_clean.dta` | Phase 1 baseline survey |
| `p1_end3wks_indiv_clean.dta` | Phase 1: 3 weeks follow-up |
| `p1_end5m_clean.dta` | Phase 1: 5 months follow-up |
| `p1_e7m_clean.dta` | Phase 1: 7 months follow-up |
| `p1_e12m_clean.dta` | Phase 1: 12 months follow-up |
| `p1_e13m_clean.dta` | Phase 1: 13 months follow-up |

**Phase 2 datasets:**

| File | Description |
| ------ | ------------- |
| `p2_baseline_clean.dta` | Phase 2 baseline survey |
| `p2_end2wks_clean.dta` | Phase 2: 2 weeks follow-up |
| `p2_e2w2w_clean.dta` | Phase 2: 2w2w follow-up |
| `p2_end12m_clean.dta` | Phase 2: 12 months follow-up |
| `p2_end13m_clean.dta` | Phase 2: 13 months follow-up |

**Phase 3 datasets:**

| File | Description |
| ------ | ------------- |
| `p3_baseline_clean.dta` | Phase 3 baseline survey |
| `p3_end2w_clean.dta` | Phase 3: 2 weeks follow-up |
| `p3_end5w_clean.dta` | Phase 3: 5 weeks follow-up |
| `p3_end12m_clean.dta` | Phase 3: 12 months follow-up |
| `p3_end13m_clean.dta` | Phase 3: 13 months follow-up |

**Administrative and tracking datasets:**

| File | Description |
| ------ | ------------- |
| `STYL_adm.dta` | Administrative data |
| `STYL_attrit.dta` | Attrition tracking data |
| `SDV_Data.csv` | Self-reported sensitive behavior validation data (for Table 3) |

### Excel Template (outfiles/)

| File | Description |
| ------ | ------------- |
| `AER2017_template.xlsx` | Excel template that gets populated with Table 3 data |

## Optional Pre-Generated Files (data/clean/)

The following intermediate files are **generated by the code** during execution. They do not need to be distributed but can be pre-generated to speed up replication:

| File | Description |
| ------ | ------------- |
| `STYL_10yrs_clean.dta` | Cleaned 10-year endline data |
| `STYL_Deaths.dta` | Validated death records |
| `STYL_all_attrition.dta` | Attrition indicators by round |
| `STYL_updated_deaths.dta` | Updated death corrections |
| `STYL_lr_panel.dta` | Full panel merge (all phases/rounds) |
| `STYL_lr_panel_preindex.dta` | Panel with harmonized categorical variables |
| `STYL_lr_panel_analysis.dta` | Final analysis dataset with all indices |

## Do File Structure

### Key Files (dofiles/)

| File | Description |
| ------ | ------------- |
| `master_10Yrep.do` | **Master script** - runs all other files in correct order |
| `STYL_10Yrep_packages.do` | Installs required Stata packages (ftools, reghdfe, coefplot, etc.) |
| `STYL_longrun_globals.do` | Sets up global macros used throughout the code |
| `STYL_longrun_label.do` | Labels variables in the analysis dataset |

### Folder Summaries

| Folder | Files | Description |
| -------- | ------- | ------------- |
| `preindex/` | 6 files | Data cleaning, tracking, and merging across phases |
| `construction/` | 26 files | Builds outcome indices (antisocial behavior, mental health, etc.) |
| `functions/` | 8 files | Reusable helper functions for table creation and regression output |
| `analysis/` | 3 files | Final analysis scripts that generate tables and figures |

### Execution Flow

The master file executes scripts in four phases:

1. **Clean Survey Data**: Merges raw data into panel format → `STYL_lr_panel.dta`
2. **Harmonize Variables**: Standardizes categorical variables → `STYL_lr_panel_preindex.dta`
3. **Build Indices**: Constructs outcome measures → `STYL_lr_panel_analysis.dta`
4. **Analysis**: Generates all tables and figures → `outfiles/`
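
Each of the first three phases leaves a dataset in `data/clean/`, so those files can serve as checkpoints when a run is interrupted. A minimal sketch, assuming `$direc` is set with a trailing slash and using the file names listed in this README:

```stata
* Sketch: report which phase checkpoints already exist in data/clean/.
* ASSUMPTION: $direc is set with a trailing slash.
local found 0
foreach f in STYL_lr_panel STYL_lr_panel_preindex STYL_lr_panel_analysis {
    capture confirm file "${direc}data/clean/`f'.dta"
    if _rc == 0 {
        display as text "found:   `f'.dta"
        local ++found
    }
    else {
        display as text "missing: `f'.dta"
    }
}
display as text "Checkpoints found: `found' of 3"
```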

## Expected Outputs

### Tables (outfiles/tables/)

The code generates 19 LaTeX table files:

| Output File | Paper Reference |
| ------------- | ----------------- |
| `Table1_main_plus_attr.tex` | Table 1: Ten-Year Impacts on Antisocial Behaviors and Mortality |
| `Table2_LTvs10Y.tex` | Table 2: One- versus Ten-Year Impacts on Antisocial Behaviors and Secondary Outcomes |
| `TableA1_random_block.tex` | Table A.1: Study sample and treatment assignment by randomization block |
| `TableA2_balance.tex` | Table A.2: Summary statistics and randomization balance, 10-year surveyed sample only |
| `TableA3_attrition_death.tex` | Table A.3: Attrition balance by treatment arm and baseline covariates |
| `TableA4_attrition_x_base.tex` | Table A.4: Attrition balance interacting treatment arm and baseline covariates |
| `TableB1_main_robustness.tex` | Table B.1: Robustness to different covariates and alternative index construction |
| `TableB2_asb_attbounds.tex` | Table B.2: Attrition bound estimates for antisocial behaviors |
| `TableB3_het_c_appendix.tex` | Table B.3: Heterogeneity in program impacts by baseline antisocial behavior |
| `TableB4_death_tab_attr.tex` | Table B.4: Cause of death for mortality before the 10-year survey |
| `TableB5_death_tab_rprt.tex` | Table B.5: Cause of death for mortality before, during, and after 10-year survey |
| `TableB6_index_timep.tex` | Table B.6: Program impacts on components of the time preferences index |
| `TableB7_index_selfc.tex` | Table B.7: Program impacts on components of the self control index |
| `TableB8_index_identi.tex` | Table B.8: Program impacts on components of the identity and values index |
| `TableB9_index_mental.tex` | Table B.9: Program impacts on components of the mental health index |
| `TableB10_index_abuse.tex` | Table B.10: Program impacts on components of the substance abuse index |
| `TableB11_index_famnet.tex` | Table B.11: Program impacts on components of the social networks index |
| `TableB12_index_econ.tex` | Table B.12: Program impacts on components of the economic performance index |
| `TableB13_mediation_analysis.tex` | Table B.13: Correlation between secondary outcomes and antisocial-behaviors index |

### Figures (outfiles/figures/)

| Output File | Paper Reference |
| ------------- | ----------------- |
| `Figure1_Level_ATE.pdf` | Figure 1: Average treatment effects |
| `Figure2_ATE_het_ASB.pdf` | Figure 2: Heterogeneous effects by baseline antisocial behavior |

### Excel Output (outfiles/)

| Output File | Description |
| ------------- | ------------- |
| `AER2017.xlsx` | Table 3 data (updates "t9 raw" sheet with SDV validation results) |

## Data Overview

### Raw Data

The `styl_endline_clean_combined.dta` and `styl_round2_clean_combined.dta` files are SurveyCTO exports for the 10-year follow-up (rounds 7 and 8 respectively).

The `styl_outcomes.dta` file contains tracking information for each individual. Some corrections for particular cases are included in `STYL_longrun_tracking.do`.

The `deaths_followup_2022/` folder contains validated death information from follow-up verification efforts.

### Previous Round Data

Despite being labeled "clean," these are raw files from each data collection round, in which variables are initially coded as strings. The naming convention is maintained for internal consistency.

The `SDV_Data.csv` file contains self-reported sensitive behavior data used for the validation exercise reported in Table 3.

## Panel Structure

The main analysis dataset `STYL_lr_panel_analysis.dta` is a panel defined by:

- **partid**: Participant ID
- **round**: Survey round (1-8)
- **phase**: Recruitment phase (1=Pilot/100 people, 2=398 people, 3=501 people)
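
The toy example below illustrates these keys on made-up data. It assumes that `partid` and `round` together identify a row and that `phase` is constant within participant; neither property is verified here against the real dataset. Note that `clear` drops any data in memory.

```stata
* Toy illustration of the panel keys described above (not the real data).
clear
input partid round phase
1 1 1
1 5 1
2 1 2
2 2 2
2 5 2
end

* ASSUMPTION: each participant-round pair appears at most once.
isid partid round

* ASSUMPTION: phase is fixed within participant.
bysort partid (round): assert phase == phase[1]
```

The same two checks could be run on `STYL_lr_panel_analysis.dta` after loading it with `use`.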

### Survey Rounds

| Round | Timing | Notes |
| ------- | -------- | ------- |
| Baseline | Pre-treatment | Variables with `_b` suffix |
| 1 | 2 weeks post-treatment | Variables with `_e` suffix when round==1 |
| 2 | 4 weeks post-treatment | Phase 2 & 3 only; `_e` suffix when round==2 |
| 3 | 5 months post-treatment | Phase 1 only |
| 4 | 7 months post-treatment | Phase 1 only |
| 5 | 12 months post-treatment | All phases |
| 6 | 13 months post-treatment | All phases |
| 7 | ~114 months (9.5 years) | All phases |
| 8 | ~115 months (10 years) | All phases |

### Variable Suffixes

| Suffix | Description |
| -------- | ------------- |
| `_b` | Baseline value |
| `_e` | Endline value (round-specific) |
| `_stav` | Short-term average (rounds 1-2) |
| `_ltav` | Long-term average (rounds 5-6) |
| `_tyav` | Ten-year average (rounds 7-8) |
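
To make the averaging suffixes concrete, here is a toy sketch of a ten-year average over rounds 7-8. The variable `y_e` is hypothetical, and skipping missing rounds is an assumption; the construction do-files may handle missing values differently.

```stata
* Toy illustration of a ten-year average across rounds 7-8 (not the authors' code).
* ASSUMPTION: y_e is a hypothetical endline outcome.
clear
input partid round y_e
1 7 2
1 8 4
2 7 1
2 8 .
end

* Mean over available rounds 7-8 within participant; missing rounds are skipped.
bysort partid: egen y_tyav = mean(cond(inrange(round, 7, 8), y_e, .))
list, sepby(partid)
```

In this toy data, participant 1 gets the mean of both rounds, while participant 2 gets the single observed value.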

## Package Dependencies

The following Stata packages are required:

| Package | Purpose | Installation |
| --------- | --------- | -------------- |
| `ftools` | Fast data manipulation | `ssc install ftools` |
| `reghdfe` | High-dimensional fixed effects | `ssc install reghdfe` |
| `coefplot` | Coefficient plots | `ssc install coefplot` |
| `astile` | Fast percentile calculation | `ssc install astile` |
| `outreg` | Table output formatting | `ssc install outreg` |
| `mdesc` | Missing data description | `ssc install mdesc` |
| `fsum` | Fast summary statistics | `ssc install fsum` |
| `dm74` | Date utilities | `net install dm74, from("http://www.stata.com/stb/stb52/")` |

**Note:** The `dm74` package is installed from an older, plain-HTTP URL. If that installation fails, the replication may still run, because this package may not be critical.
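
The SSC packages in the table can be installed conditionally, so that a rerun does not download them again. This sketch mirrors the table above but is not copied from `STYL_10Yrep_packages.do`. It requires an internet connection for any package that is missing and leaves `dm74` to the `net install` command shown in the table.

```stata
* Sketch: install the SSC packages listed above only if not already present.
* ASSUMPTION: this is an illustration, not the contents of STYL_10Yrep_packages.do.
foreach pkg in ftools reghdfe coefplot astile outreg mdesc fsum {
    capture which `pkg'
    if _rc {
        display as text "Installing `pkg' from SSC..."
        ssc install `pkg'
    }
}
```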

## Additional Documentation

For detailed documentation, see the `docs/` folder:

- **[docs/CODEBOOK.md](docs/CODEBOOK.md)** - Variable naming conventions, global macros reference, data transformations
- **[docs/FUNCTIONS.md](docs/FUNCTIONS.md)** - Documentation for custom Stata functions

## Data and Code Availability Statement

The data sources used in the replication package can be found in the references. For Table 3, we used Table 9 from Blattman et al. (2017). The replication package for that paper can also be found in the references.

## References

Blattman, Christopher, Julian C. Jamison, and Margaret Sheridan. "Reducing Crime and Violence: Experimental Evidence from Cognitive Behavioral Therapy in Liberia." American Economic Review 107, no. 4 (April 2017): 1165–1206. <https://doi.org/10.1257/aer.20150503>.

Blattman, Christopher, Sebastian Chaskel, Julian C. Jamison, and Margaret Sheridan. "Replication data for: Cognitive Behavior Therapy Reduces Crime and Violence over 10 Years: Experimental Evidence." American Economic Review. AEA Data and Code Repository, 2022.

Blattman, Christopher, Sebastian Chaskel, Julian C. Jamison, and Margaret Sheridan. "The Long Run Effects of a Program Aimed At Reducing Crime and Violence: Experimental Evidence from Cognitive Behavioral Therapy in Liberia." AEA RCT Registry, November 19, 2022. <https://doi.org/10.1257/rct.6736-1.1>.
